Character Identification on Multiparty Conversation: Identifying Mentions of Characters in TV Shows
Abstract
This paper introduces a subtask of entity linking, called character identification, that maps mentions in multiparty conversation to their referent characters. Transcripts of TV shows are collected as the sources of our corpus and automatically annotated with mentions by linguistically motivated rules. These mentions are manually linked to their referents through crowdsourcing. Our corpus comprises 543 scenes from two TV shows and shows an inter-annotator agreement of κ = 79.96. For statistical modeling, the task is reformulated as coreference resolution, and we experiment with a state-of-the-art system on our corpus. Our best model gives a purity score of 69.21 on average, which is promising given the challenging nature of this task and our corpus.
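The abstract reports a purity score without defining it. As a rough illustration only, the sketch below computes the standard cluster purity measure, in which each predicted cluster is credited with its most frequent gold referent. This is an assumption: the exact definition used in the paper may differ, and the function name, mention IDs, and character names are hypothetical.

```python
from collections import Counter


def purity(clusters, gold):
    """Standard cluster purity (assumed definition, not taken from the paper).

    clusters: list of lists of mention IDs (predicted coreference clusters)
    gold:     dict mapping each mention ID to its gold referent character
    """
    total = sum(len(cluster) for cluster in clusters)
    if total == 0:
        return 0.0
    # For each non-empty cluster, count the mentions that belong to its
    # majority gold character, then normalize by the number of mentions.
    majority = sum(
        Counter(gold[m] for m in cluster).most_common(1)[0][1]
        for cluster in clusters
        if cluster
    )
    return majority / total


# Hypothetical toy data: five mentions, three characters, two predicted clusters.
gold = {"m1": "Alice", "m2": "Alice", "m3": "Bob", "m4": "Bob", "m5": "Carol"}
clusters = [["m1", "m2", "m3"], ["m4", "m5"]]
score = purity(clusters, gold)
print(f"purity = {score:.2f}")
```

In this toy case the first cluster contributes two correctly grouped mentions and the second contributes one, so three of the five mentions count toward purity.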
Similar resources
Evaluating Deep Learning Approaches for Character Identification in Multiparty Dialogues
Character identification is an entity linking task that identifies each mention as a certain character in multiparty dialogue, where mentions are typically nominals referring to a person and entities may be speakers themselves or even external characters. Identifying such mentions as real characters requires cross-document entity resolution, which makes this task challenging. This task involves c...
Robust Coreference Resolution and Entity Linking on Dialogues: Character Identification on TV Show Transcripts
This paper presents a novel approach to character identification, an entity linking task that maps mentions to characters in dialogues from TV show transcripts. We first augment and correct several cases of annotation errors in an existing corpus so that the corpus is clearer and cleaner for statistical learning. We also introduce the agglomerative convolutional neural network that takes gro...
Iranian License Plate Recognition Using a Fuzzy Support Vector Machine Algorithm
License plate recognition is one of the most important applications used in intelligent transportation systems. The difficulty of correctly detecting and identifying car plates under different environmental conditions leads researchers to try new approaches to better solve the problem. The license plate recognition problem is divided into three subproblems: "Plate Location", "Character Segmentation", ...
"Hello! My name is... Buffy" -- Automatic Naming of Characters in TV Video
We investigate the problem of automatically labelling appearances of characters in TV or film material. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The prin...
Multimodal Subjectivity Analysis of Multiparty Conversation
We investigate the combination of several sources of information for the purpose of subjectivity recognition and polarity classification in meetings. We focus on features from two modalities, transcribed words and acoustics, and we compare the performance of three different textual representations: words, characters, and phonemes. Our experiments show that character-level features outperform wo...
Publication date: 2016